When the network has no activation functions, each layer is just a linear map, and a composition of linear maps is itself linear, so the network cannot possibly equal anything non-linear.
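A quick sketch of this point (the weight matrices and sizes are illustrative): two stacked layers without activations collapse into a single matrix, so the composition stays linear.

```python
import numpy as np

# Two "layers" without activations are just matrices W1, W2 (sizes are
# illustrative). Their composition equals the single linear map W2 @ W1,
# so stacking linear layers can never produce a non-linear function.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(two_layers, collapsed))  # True
```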
For the step activation function, multiplying the weights by a scalar scales the weighted sum accordingly, and the same holds for the threshold. So if the threshold and the weights are scaled by the same (positive) scalar, the output always remains the same.
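A minimal check of this invariance (weights, input, and the scalar are made-up values; note the scalar must be positive, since a negative one would flip the inequality):

```python
import numpy as np

def step_neuron(w, x, threshold):
    # Step activation: fire (output 1) iff the weighted sum reaches the threshold.
    return 1 if np.dot(w, x) >= threshold else 0

w = np.array([0.5, -1.2, 2.0])
x = np.array([1.0, 0.3, -0.7])
t = 0.1
c = 3.7  # any positive scalar

# Scaling both the weights and the threshold by c leaves the output unchanged:
# c*(w.x) >= c*t  <=>  w.x >= t  (for c > 0).
print(step_neuron(w, x, t) == step_neuron(c * w, x, c * t))  # True
```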
Arcs are weights and nodes are neurons.
Since the embeddings are taken from the middle layer of this architecture, that layer has the same dimension as our desired output.
For arcs (connections between neurons), we multiply the number of input neurons by the middle-layer dimension, and then do the same for the other half of the architecture.
For neurons, we simply count the neurons in each layer.
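The counting can be sketched like this (the dimensions n and m are hypothetical, for an autoencoder with input/output size n and embedding size m):

```python
# Hypothetical dimensions: input and output layers of size n,
# embedding (middle) layer of size m.
n, m = 10, 3

# Arcs (weights): every input neuron connects to every middle neuron,
# and every middle neuron connects to every output neuron.
arcs = n * m + m * n

# Neurons: input layer + middle layer + output layer.
neurons = n + m + n

print(arcs, neurons)  # 60 23
```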
Important points for GAN training are:
If we multiply the matrix by the vector, the vector's entries are simply reordered: the matrix is a permutation matrix.
For example, if we multiply the first row with the vector, we get
In this second exercise, applying the permutation three times returns the original vector. Since 99 is a multiple of 3, after 99 applications we are back to the original.
Notice that we can discard any candidate results whose entries have been scaled, because a permutation matrix only reorders entries and never scales them.
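The cycle argument can be verified with a small example (this particular 3-cycle matrix is illustrative, not the one from the exercise):

```python
import numpy as np

# A 3-cycle permutation matrix: it sends x = (x0, x1, x2) to (x2, x0, x1).
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
x = np.array([1, 2, 3])

# Applying P three times returns the original vector...
print(np.array_equal(np.linalg.matrix_power(P, 3) @ x, x))  # True
# ...so after 99 = 3 * 33 applications we are also back where we started.
print(np.array_equal(np.linalg.matrix_power(P, 99) @ x, x))  # True
```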
Cell State: The cell state, also known as the long-term memory, is a kind of "conveyor belt" that runs through the entire LSTM cell. It carries information that can be added to or removed from the cell state, regulated by structures called gates. The cell state can remember values over arbitrary time intervals, making it crucial for tasks that require memory of past information like time series prediction, text generation, and more.
Forget Gate: The forget gate decides what information should be discarded from the cell state. It uses a sigmoid activation function to output values between 0 and 1. A value of 0 means "completely forget this component", while a value of 1 means "completely retain this component".
Input Gate: The input gate decides how much of the new information from the current input should be stored in the cell state. It uses a sigmoid activation function to output values between 0 and 1, indicating how much of each component should be let through. A value of 0 means "ignore this component", while a value of 1 means "let this component through".
Output Gate: The output gate decides what the next hidden state (short-term memory) should be. This is based on the current input, the previous hidden state, and the updated cell state. Like the input and forget gates, the output gate uses a sigmoid activation function to control how much of each component to let through.
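The gate descriptions above fit together in one cell update. Here is a minimal sketch of a single LSTM step (weight names, dimensions, and the concatenation layout are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM cell step (a minimal sketch; parameter names are illustrative).

    x: current input; h_prev: previous hidden state (short-term memory);
    c_prev: previous cell state (long-term memory).
    """
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)        # forget gate: what to discard from c
    i = sigmoid(Wi @ z + bi)        # input gate: how much new info to store
    c_tilde = np.tanh(Wc @ z + bc)  # candidate cell-state update
    c = f * c_prev + i * c_tilde    # updated cell state (the "conveyor belt")
    o = sigmoid(Wo @ z + bo)        # output gate: what to expose as h
    h = o * np.tanh(c)              # new hidden state
    return h, c

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
n_in, n_h = 2, 3
params = tuple(rng.standard_normal((n_h, n_h + n_in)) for _ in range(4)) + \
         tuple(np.zeros(n_h) for _ in range(4))
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), params)
print(h.shape, c.shape)  # (3,) (3,)
```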
A scenario is a subset of nodes/neurons.
Scenarios extend scenario A when they are connected to scenario A. Since nodes in the same layer are not connected, AB does not extend A, but AD does.
A feedforward neural network with a single hidden layer and a finite number of neurons can approximate continuous functions on compact subsets of ℝ^n.
This means that, with proper weights, such a network can approximate any continuous function on a compact set arbitrarily well.
The formula states that, at every point of the domain, the approximation is within error ε of the real function.
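The standard way to write this guarantee (the usual statement of the universal approximation theorem, with σ a sigmoidal activation):

```latex
% For any continuous f on a compact set K \subset \mathbb{R}^n and any
% \varepsilon > 0, there exist N, v_i, w_i, b_i such that the
% one-hidden-layer network
\[
  F(x) = \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right)
\]
% satisfies
\[
  \left| F(x) - f(x) \right| < \varepsilon
  \quad \text{for all } x \in K.
\]
```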
Networks with tanh and sigmoid activations can approximate the same class of functions: both are S-shaped squashing functions, and in fact they are related by an affine transformation (tanh(x) = 2σ(2x) − 1).
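A quick numerical check of that identity, which is why a tanh unit can always be emulated by a sigmoid unit (double the incoming weights, then apply an affine map to the output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# tanh is just a shifted and scaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1,
# so the two activation families approximate the same class of functions.
x = np.linspace(-5, 5, 101)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```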
Compute the classification; if it is right, do nothing; if it is wrong, update the weights.
The formula is the classic perceptron update, w ← w + η·y·x (applied only on a mistake, with labels y ∈ {−1, +1}).
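The mistake-driven rule can be sketched as follows (a minimal sketch: labels are assumed in {−1, +1}, and the bias term and learning rate η are conventional additions not spelled out in the note):

```python
import numpy as np

def perceptron_update(w, b, x, y, eta=1.0):
    # Classic perceptron rule: predict, and only change the weights when
    # the prediction is wrong: w <- w + eta*y*x, b <- b + eta*y.
    y_hat = 1 if np.dot(w, x) + b >= 0 else -1
    if y_hat != y:              # wrong: move the boundary toward x
        w = w + eta * y * x
        b = b + eta * y
    return w, b                 # right: do nothing

w, b = np.zeros(2), 0.0
w, b = perceptron_update(w, b, np.array([1.0, 2.0]), y=-1)
print(w, b)  # [-1. -2.] -1.0
```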